Summary. The behavior of complex networks under attack depends strongly on
the specific attack scenario. Of special interest are scale-free networks, which are
usually seen as robust under random failure or attack but appear to be especially
vulnerable to targeted attacks. In a recent study of public transport networks of 14
major cities of the world we have shown that these networks may exhibit scale-free
behaviour [Physica A 380, 585 (2007)]. Our further analysis, the subject of this report,
focuses on the effects that defunct or removed nodes have on the properties of public
transport networks. Simulating different attack strategies, we elaborate vulnerability
criteria that allow one to find minimal strategies with high impact on these systems.



1 Introduction 

A number of different phenomena related to complex networks [1] may be described
in terms of percolation theory [2]. Take for example a network built following given
construction rules. How should the rules be tuned such that an infinite connected
component is constructed with finite probability, and what are the properties
of this class of networks when the parameters reach the corresponding percolation
threshold? Given that percolation is in general seen as a critical phenomenon, one
may expect to find power laws in the vicinity of this point. Since the network (class)
is described by more than one parameter, there are also many scenarios for crossing
the threshold, each exhibiting different behavior of the observables. Related questions are:
how do infections spread on a network, and are there optimal immunization strategies?
These and similar questions are best formulated within percolation theory [2]
generalized from its original formulation for regular grids to general network graphs.

In this paper we intend to apply concepts of complex network theory [1] to 
analyze the behaviour of urban public transport networks (PTNs) under successive 
removal of their constituents. In particular, continuing our recent study of PTNs of 
14 major cities of the world [3,4], we analyse their resilience against targeted attacks
following different scenarios. 

It has been observed before that the behaviour of a complex network under an 
attack that removes nodes or links may drastically differ from that of regular lattices 
(i.e. from the classical percolation problem). Early evidence of this fact was found
by analysing real-world scale-free networks: the WWW and the Internet [5,6], as well as
metabolic [7], food web [8], and protein [9] networks. In these studies, the interest was
in the robustness of these networks subject to the removal of their nodes. It appeared 
that these networks display an unexpectedly high degree of robustness under random 
failure. However, if the scenario is changed towards "targeted" attacks, the same 
networks may appear to be especially vulnerable [10,11]. 

To check the attack resilience of a network, different attack scenarios have
been proposed. For example, a list of vertices ordered by decreasing degree may be prepared
for the unperturbed network, and the attack successively removes vertices according
to this original list [12, 13]. In a slightly different scenario the vertex degrees are
recalculated and the list is reordered after each removal step [5]. In initial studies
only little difference between these two scenarios was observed [11]; however, further
analysis showed [14, 15] that attacks according to recalculated lists often turn out to
be more harmful than the attack strategies based on the initial list, suggesting that
the network structure changes as important vertices or edges are removed. Other
scenarios consider attacks following an order imposed by different 'centralities' of
the nodes, e.g. the so-called betweenness centrality [15]. In particular, for the worldwide
airport network it has been shown recently [16, 17] that nodes with higher
betweenness play a more important role in keeping the network connected than
those with high degree.
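The difference between the two degree-based scenarios can be made concrete with a minimal sketch. It assumes the network is stored as a dictionary mapping each node to the set of its neighbours; the function names, the toy graph, and the tie-breaking by node label are illustrative assumptions, not part of the cited studies.

```python
def remove_node(adj, v):
    """Delete node v and all its links from the adjacency dict (in place)."""
    for u in adj.pop(v):
        adj[u].discard(v)


def degree_attack(adj, n_remove, recalculate):
    """Return the nodes removed by a highest-degree attack.

    recalculate=False: order fixed by the degrees of the unperturbed network.
    recalculate=True:  degrees are recomputed after every removal.
    Ties are broken by node label to keep this sketch deterministic.
    """
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    removed = []
    if not recalculate:
        order = sorted(adj, key=lambda v: (-len(adj[v]), v))
        for v in order[:n_remove]:
            remove_node(adj, v)
            removed.append(v)
    else:
        for _ in range(min(n_remove, len(adj))):
            v = min(adj, key=lambda u: (-len(adj[u]), u))
            remove_node(adj, v)
            removed.append(v)
    return removed


# Hypothetical toy graph: hub 0 with leaves 1, 2 and a chain 0-3-4-5-6.
toy = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
initial_order = degree_attack(toy, 3, recalculate=False)
recalculated_order = degree_attack(toy, 3, recalculate=True)
```

On this toy graph the two orders already diverge after the first removal: once the hub is gone, node 3 has lost a link and is no longer among the best-connected nodes, so the recalculated list moves on to node 4.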

As it turns out, the behavior under attack of different real-world networks, even
if they are scale-free, differs considerably; e.g. computer networks behave differently
from collaboration networks, see [15]. Therefore, it is important to investigate
to what extent the behaviour under attack of different real-world networks is consistent
or shows strong variations. Below we present some results of our analysis for the
PTNs of 14 major cities of the world (see Ref. [3] and chapter [4] of this volume
for a detailed description of the included PTNs). A more complete survey will be the
subject of a separate publication [18].

2 Observables and attack strategies 

In the analysis presented below we consider the PTNs of the following cities: Berlin 
(number of stations N = 2996, number of routes M = 218), Dallas (N = 6571, 
M = 131), Düsseldorf (N = 1544, M = 124), Hamburg (N = 8158, M = 708), Hong
Kong (N = 2117, M = 321), Istanbul (N = 4043, M = 414), London (N = 11012,
M = 2005), Los Angeles (N = 46244, M = 1893), Moscow (N = 3755, M = 679),
Paris (N = 4003, M = 232), Rome (N = 6315, M = 681), São Paulo (N = 7223,
M = 998), Sydney (N = 2034, M = 596), Taipei (N = 5311, M = 389). This 
sampling includes cities from different continents, with different concepts of planning 
and different history of the evolution and growth of the city and its PTN. For the 
purpose of this paper let the PTN of a given city be given by the routes offered in this 
network. Each route services a given ordered list of stations. Representing the PTN 
in terms of a graph, we apply the following mapping: each station is represented 
by a node; any two nodes that are successively serviced by at least one route are
connected by a single link. We note that there are several other ways to represent
a PTN as a graph [3,4,19,20]. The particular representation that we use here is
referred to as L-space in Refs. [3,4,20].
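The mapping described above can be sketched in a few lines. The sketch assumes that each route is available as an ordered list of station identifiers; the function name and the two example routes are hypothetical.

```python
def l_space(routes):
    """Build an L-space adjacency dict from routes given as ordered station lists.

    Stations that follow each other on at least one route are joined by a
    single link, no matter how many routes share that segment.
    """
    adj = {}
    for stations in routes:
        for s in stations:
            adj.setdefault(s, set())
        for a, b in zip(stations, stations[1:]):
            if a != b:  # ignore a station listed twice in a row
                adj[a].add(b)
                adj[b].add(a)
    return adj


# Two hypothetical routes sharing the segment B-C.
routes = [["A", "B", "C", "D"], ["E", "B", "C", "F"]]
graph = l_space(routes)
n_links = sum(len(nb) for nb in graph.values()) // 2
```

Because neighbours are stored in sets, the segment B-C served by both routes contributes only one link, as required by the definition.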

The importance of a node $i$ of a given network $\mathcal{N}$ may be measured by calculating
a number of graph-theoretical indicators. Besides the node degree $k_i$, which in our
representation equals the number of nearest neighbours $z_1(i)$ of a given node $i$,
different centralities of the node may be defined as follows (see e.g. [21]):

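closeness centrality    $C_C(i) = \frac{1}{\sum_{t \in \mathcal{N}} \ell(i,t)}$,    (1)

graph centrality    $C_G(i) = \frac{1}{\max_{t \in \mathcal{N}} \ell(i,t)}$,    (2)

stress centrality    $C_S(i) = \sum_{s \neq i \neq t \in \mathcal{N}} \sigma_{st}(i)$,    (3)

betweenness centrality    $C_B(i) = \sum_{s \neq i \neq t \in \mathcal{N}} \frac{\sigma_{st}(i)}{\sigma_{st}}$.    (4)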

In Eqs. (1)-(4), $\ell(i,t)$ is the shortest-path length between a pair of nodes $i,t$ that
belong to a network $\mathcal{N}$, $\sigma_{st}$ is the number of shortest paths between two nodes
$s,t \in \mathcal{N}$, and $\sigma_{st}(i)$ is the number of shortest paths between nodes $s$ and $t$ that
go through the node $i$. When observing a network under attack we will also record
the number of next nearest neighbours $z_2(i)$ and the clustering coefficient $C(i)$ of all remaining
nodes $i$. The latter is the ratio of the number of links $E_i$ between the $k_i$ nearest
neighbours of $i$ and the maximal possible number of mutual links between them:

$C(i) = \frac{2 E_i}{k_i (k_i - 1)}$.    (5)
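As an illustration of Eqs. (1), (2) and (5), the following sketch evaluates the closeness centrality, the graph centrality and the clustering coefficient of a single node by breadth-first search. It assumes an unweighted, connected graph with at least two nodes, stored as a dictionary of neighbour sets; the function names and the convention $C(i) = 0$ for nodes with fewer than two neighbours are assumptions of this sketch.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, i):
    """Eq. (1): inverse of the summed distances from i (connected graph assumed)."""
    return 1.0 / sum(bfs_distances(adj, i).values())


def graph_centrality(adj, i):
    """Eq. (2): inverse of the largest distance from i."""
    return 1.0 / max(bfs_distances(adj, i).values())


def clustering(adj, i):
    """Eq. (5): fraction of possible links realised among the neighbours of i."""
    k = len(adj[i])
    if k < 2:
        return 0.0  # convention assumed here; Eq. (5) is undefined for k < 2
    nbrs = adj[i]
    # Every link between two neighbours is seen from both ends, hence the // 2.
    links = sum(1 for u in nbrs for w in adj[u] if w in nbrs) // 2
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy graph: triangle 0-1-2 with a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

In the toy graph, node 2 is adjacent to every other node and therefore has the largest closeness and graph centrality, while its clustering coefficient is lowered by the pendant neighbour.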

Note that the mean values of all the quantities introduced above are well defined
for a connected network $\mathcal{N}$. However, some of the analysed PTNs consist of several
disconnected components even before any perturbation is applied. Moreover, the
number of components naturally increases when nodes are removed. Therefore, we
restrict averages of the observables to the largest network component GCC $\subset \mathcal{N}$.
We will indicate these averages by an over-line. Nevertheless, some of the quantities are
also well defined for the whole network; the corresponding average will be denoted
by angular brackets. As an example we note the mean inverse shortest path length:

$\langle \ell^{-1} \rangle = \frac{1}{N(N-1)} \sum_{i \neq j \in \mathcal{N}} \ell^{-1}(i,j)$,    (6)

where the summation spans all $N$ sites of the (possibly disconnected) network
and we define $\ell^{-1}(i,j) = 0$ if nodes $i, j$ are disconnected. Note that in this case $\langle \ell \rangle$ is
obviously ill-defined.
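A minimal sketch of the mean inverse shortest path length on a possibly disconnected graph is given below. It assumes that the sum runs over ordered pairs of distinct nodes and that unreachable pairs contribute zero; the function name and the toy graph are illustrative.

```python
from collections import deque


def mean_inverse_path_length(adj):
    """Average of 1/l(i,j) over all ordered pairs i != j.

    Pairs in different components are never reached by the search and
    therefore contribute 0 to the sum.
    """
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for i in adj:
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for t, d in dist.items() if t != i)
    return total / (n * (n - 1))


# Hypothetical toy graph: a path 0-1-2 plus an isolated node 3.
toy = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
value = mean_inverse_path_length(toy)
```

For the toy graph the six connected ordered pairs contribute $4 \cdot 1 + 2 \cdot \frac{1}{2} = 5$, so the mean over all 12 ordered pairs is $5/12$, whereas the mean path length itself would be undefined because of the isolated node.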

In what follows, we will pursue a number of different attack strategies, i.e. selection
rules and criteria to remove the nodes (vertices). In particular, the scenarios are the
following. "Random vertex" (RV): vertices (nodes) are removed in random order.
"Random neighbour" (RN): one by one, a randomly chosen neighbour of a randomly
chosen node is removed. This scenario appears to be effective for immunization
problems [22]; it is based on the fact that in this way nodes with a high number
of neighbours will be selected with higher probability. In further scenarios nodes are
removed according to lists prepared in the order of decreasing node degrees
($k$), centralities ($C_C$, $C_G$, $C_S$, $C_B$), the number of their second nearest
neighbours ($z_2$), and increasing clustering coefficient ($C$). The latter seven scenarios
can either be implemented according to lists prepared for the initial PTN before the
attack (we indicate the corresponding scenario by a subscript $i$, e.g. $C_{C,i}$), or the
list is rebuilt by recalculating the order of the remaining nodes after each step. In this
way we follow sixteen different strategies in attacking the networks. The observed
changes of the properties of the PTN under these attacks are described in the next
section.
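The bias of the RN scenario towards well-connected nodes can be checked with a short sketch. The helper names, the star-shaped toy graph and the restriction of the first draw to nodes that still have neighbours are assumptions made for illustration.

```python
import random


def pick_random_vertex(adj, rng):
    """RV: every remaining node is equally likely to be picked."""
    return rng.choice(sorted(adj))


def pick_random_neighbour(adj, rng):
    """RN: pick a random node, then one of its neighbours at random.

    Assumption of this sketch: the first draw is restricted to nodes that
    still have at least one neighbour. Returns None if no links are left.
    """
    candidates = sorted(v for v in adj if adj[v])
    if not candidates:
        return None
    v = rng.choice(candidates)
    return rng.choice(sorted(adj[v]))


# Hypothetical star: hub 0 connected to leaves 1..5.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
rng = random.Random(0)
rv_hub_hits = sum(pick_random_vertex(star, rng) == 0 for _ in range(6000))
rn_hub_hits = sum(pick_random_neighbour(star, rng) == 0 for _ in range(6000))
```

On the star, RV selects the hub with probability 1/6, whereas RN selects it whenever the first draw lands on a leaf, i.e. with probability 5/6.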



3 Numerical results 

The theory of complex networks is concerned with the properties of ensembles of 
networks (graphs) that are characterized e.g. by common construction rules. Such 
an ensemble is said to be in the percolation regime if even the infinite graphs in 
this ensemble have a connected component that contains a finite fraction of their 
nodes. This component is referred to as the giant connected component GCC. If 
the ensemble properties are controlled by some parameter, e.g. the concentration of 
active nodes, then the percolation threshold in terms of this parameter is defined 
as the value at which the network ensemble enters the percolation regime. In the 
present case of finite networks we denote by GCC the largest connected component 
of a given network. For the finite networks defined by the PTNs we analyze the
behaviour of their largest component, which contains $N_{\rm GCC}$ nodes. We introduce
the normalized largest component size $S$ by:

$S = \frac{N_{\rm GCC}}{N} \times 100\%$.    (7)
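A minimal sketch of how $S$ may be evaluated for a partially destroyed network is shown below. It assumes that the normalization is taken with respect to the number of nodes of the unperturbed network, which is passed in explicitly; the function names and the example are hypothetical.

```python
from collections import deque


def largest_component_size(adj):
    """Number of nodes in the largest connected component (0 for an empty graph)."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def normalized_gcc_size(adj, n_initial):
    """S in percent; n_initial is the node count of the unperturbed network
    (an assumption of this sketch about the normalization in Eq. (7))."""
    return 100.0 * largest_component_size(adj) / n_initial


# Hypothetical remainder of a 5-node network after one node has been removed.
remaining = {0: {1}, 1: {0}, 2: set(), 3: set()}
s_value = normalized_gcc_size(remaining, n_initial=5)
```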

In Fig. 1 we show the behavior of $S$ for the attack strategies described above for the
PTNs of Dallas and Paris. At each step of the attack 1% of the nodes is
removed following the selection criteria of the given scenario. The effectiveness of
the attack scenarios may be judged by their impact on the value of $S$. As is
clearly seen from Fig. 1, the least effective is the scenario of removing random nodes
(RV): it is characterized by the slowest decrease of $S$. Another obvious conclusion
is that scenarios based on lists calculated for the initial network (marked by a
subscript $i$) appear to be less harmful than those that are based on recalculated
lists. Note however that the difference between 'initial' and 'recalculated' scenarios
is less evident in the strategies based on local characteristics, e.g. the node
degree or the number of second nearest neighbours (cf. curves for $k$, $k_i$ and $z_2$, $z_{2,i}$,
respectively). The difference is more pronounced for the centrality-based
scenarios. A principal difference between attacks on the highest degree nodes on the
one hand, and on the highest betweenness nodes on the other hand, is that the first
quantity is local, i.e. it is calculated from properties of the immediate environment
of each node, whereas the second one is global. Moreover, the first strategy aims
to remove a maximal number of edges whereas the second strategy aims to cut as
many shortest paths as possible. Our analysis shows that the most effective are
those scenarios that are targeted at nodes with the highest values of the
node degree $k$, the betweenness centrality $C_B$, the next nearest neighbour number
$z_2$, or the stress centrality $C_S$, recalculated after each step of the attack. Figs. 2, 3
show that the order of destructiveness of these scenarios differs for PTNs of different
cities. However, among the scenarios analyzed so far these four appear to be the
most effective ones.

Fig. 1. Attacks on PTNs of (a) Dallas and (b) Paris. Each curve corresponds to a
different attack scenario as indicated in the legend, see text. Horizontal axis: percentage
of removed nodes. Vertical axis: normalized size $S$ of the largest component.

Fig. 2. Four attack scenarios for different PTNs (with recalculation): attacks targeted
at nodes of the highest (a) degree $k$, (b) number of second neighbours $z_2$, (c)
betweenness centrality $C_B$, or (d) stress centrality $C_S$. Vertical and horizontal axes
as in Fig. 1.
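To illustrate why betweenness is a global quantity, the following sketch computes it for all nodes of an unweighted graph with the accumulation scheme of Brandes, which is one standard way to evaluate such sums and is not necessarily the procedure used for the results reported here. The sum runs over ordered pairs $(s,t)$, so values are twice those obtained when each unordered pair is counted once; the function name and the toy graphs are assumptions.

```python
from collections import deque


def betweenness(adj):
    """Betweenness of every node, summed over ordered pairs (s, t)."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes inwards.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


# Hypothetical toy graphs: a path 0-1-2-3 and a 4-cycle.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
cb_path = betweenness(path)
cb_cycle = betweenness(cycle)
```

In the path, the inner nodes 1 and 2 have the same degree but carry all shortest paths between the two ends. In the cycle, every node has the same degree and the same betweenness, because each pair of opposite nodes splits its two shortest paths evenly.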

Another interesting quantity that we may deduce from Fig. 2 is the vulnerability
of the network in terms of the level of destruction at which the largest network
component breaks down. We observe that this is strongly correlated with the initial
value of the so-called Molloy-Reed parameter $\kappa = \overline{z}_2/\overline{z}_1$ of the unperturbed network.
Considering model networks that are randomly built from sets of nodes with given
degree distributions, it has been shown that the value $\kappa_c = 1$ represents the percolation
threshold in such networks [22,23]. A value much larger than $\kappa_c$ then indicates
a significant distance from the threshold. The values of this parameter for the PTNs
studied here are: Dallas ($\kappa$ = 1.28), Istanbul (1.54), Los Angeles (1.59), Hamburg
(1.85), London (1.87), Berlin (1.96), Düsseldorf (1.96), Rome (2.02), Sydney (2.54),
Hong Kong (3.24), São Paulo (4.17), Paris (5.32), Moscow (6.24). Comparing in particular
with Fig. 2a we find indeed that the higher the initial $\kappa$ value, the less
vulnerable the network appears to be.
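The ratio of the mean numbers of second and first nearest neighbours can be evaluated directly from the adjacency structure. In the sketch below the means are taken over all nodes of the graph that is passed in, so the largest component should be extracted beforehand if averages restricted to the GCC are wanted; the function names and toy graphs are assumptions.

```python
def second_neighbours(adj, i):
    """Nodes at distance exactly 2 from i."""
    first = adj[i]
    reach = set()
    for u in first:
        reach |= adj[u]
    return reach - first - {i}


def molloy_reed(adj):
    """Ratio of the mean second to the mean first neighbour number.

    Both means share the same node count, so the ratio of sums suffices.
    """
    z1 = sum(len(adj[v]) for v in adj)
    z2 = sum(len(second_neighbours(adj, v)) for v in adj)
    return z2 / z1 if z1 else 0.0


# Hypothetical toy graphs: a 5-node path and a star with 4 leaves.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
kappa_path = molloy_reed(path)
kappa_star = molloy_reed(star)
```

The short path gives a ratio of 0.75, below 1, while the star gives 1.5, since every leaf sees all other leaves as second neighbours through the hub.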

To define the threshold region for the concentration of removed
nodes more precisely, we observe the behaviour of the maximal $\ell_{\max}$ and mean $\overline{\ell}$ shortest path
lengths under attack, as shown in Fig. 3. We focus on the recalculated degree scenario
($k$). Both maximal and average path lengths display similar behaviour: initial growth
and then an abrupt decrease when a certain threshold is reached. Removing
nodes initially increases the path lengths, as detours from the original shortest
paths have to be taken. Further removal of nodes then at some point
leads to the breakup of the network into smaller components, on which the paths
are naturally limited by the boundaries; this explains the sudden decrease of their
lengths. For the PTN of Paris we observe that this threshold is reached for both $\ell_{\max}$
and $\overline{\ell}$ at the same value $c_{\rm segm} \approx 13\%$. The average shortest path on all components
of the network, $\langle \ell \rangle$, also possesses a maximum in the same region (for the PTN of
Paris it occurs at $c \approx 13\%$). However, the values of $c_{\rm segm}$ differ for different cities
(see Fig. 3b) and obviously depend strongly on the attack scenario.

Fig. 3. Highest degree scenario. Horizontal axis as in Fig. 1. (a) Behavior of the
maximal and mean shortest path lengths for the Paris PTN calculated for the largest
component ($\ell_{\max}$, $\overline{\ell}$) and for the whole network ($\ell_{\max,f}$, $\langle \ell \rangle_f$). Note that a sharp
maximum occurs at 13% of removed nodes (stations) for $\ell_{\max}$, $\overline{\ell}$, $\ell_{\max,f}$. (b) Behavior
of the maximal shortest path length $\ell_{\max}$ for the PTNs of different cities.
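The growth and subsequent collapse of the maximal shortest path length can be reproduced on a toy graph. The sketch below runs a recalculated highest-degree attack, records $\ell_{\max}$ of the largest component after each removal and reports the concentration at which it peaks. Removing one node per step (rather than 1% of the nodes), breaking ties by node label, and all names are assumptions of this sketch.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def largest_component(adj):
    """Node set of the largest connected component (first one found on ties)."""
    seen, best = set(), set()
    for start in adj:
        if start not in seen:
            comp = set(bfs_distances(adj, start))
            seen |= comp
            if len(comp) > len(best):
                best = comp
    return best


def max_path_length(adj):
    """Longest shortest path inside the largest component."""
    return max(
        (d for i in largest_component(adj) for d in bfs_distances(adj, i).values()),
        default=0,
    )


def locate_segmentation(adj):
    """Return (c at the maximum of l_max in percent, full history of (c, l_max))."""
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    n0 = len(adj)
    history = [(0.0, max_path_length(adj))]
    for step in range(1, n0):
        v = min(adj, key=lambda u: (-len(adj[u]), u))  # recalculated degree
        for u in adj.pop(v):
            adj[u].discard(v)
        history.append((100.0 * step / n0, max_path_length(adj)))
    c_segm, _ = max(history, key=lambda point: point[1])
    return c_segm, history


# Hypothetical wheel: hub 0 joined to every node of the ring 1-2-...-6-1.
wheel = {0: {1, 2, 3, 4, 5, 6}, 1: {0, 2, 6}, 2: {0, 1, 3}, 3: {0, 2, 4},
         4: {0, 3, 5}, 5: {0, 4, 6}, 6: {0, 5, 1}}
c_segm, history = locate_segmentation(wheel)
```

On the wheel, removing the hub and then one ring node stretches the largest component into a path, so $\ell_{\max}$ rises from 2 to 4; the next removal splits the path and $\ell_{\max}$ drops, which places the maximum at two removed nodes out of seven.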

As discussed, the observed maximum in $\ell_{\max}$ (or in $\overline{\ell}$) appears to be a suitable
criterion to identify the values of $c$ (or at least the region in $c$) where the segmentation
of a network occurs. Other observables that resemble an 'order parameter',
such as the largest connected component size $S$, Eq. (7), or the average
value of the inverse shortest path length $\langle \ell^{-1} \rangle$, Eq. (6), are less suitable for this purpose because
of their rather smooth behaviour. In Fig. 4 we show for PTNs of fourteen cities the
behavior of $\langle \ell^{-1} \rangle$ under attacks following the four most harmful scenarios, i.e. the
recalculated highest $k$, $C_B$, $z_2$ and $C_S$ scenarios. Comparing the impact of different
attack scenarios (as seen in particular in Figs. 3, 4) one notices that the apparent
relative impact strongly depends on the choice of the observable (e.g. $S$ or $\langle \ell^{-1} \rangle$).

It is worth noting the statistical origin of the data presented so far. Different
instances of the same scenario may differ to some extent. This is obvious for the
random RV or RN scenarios, where the nodes are removed according to a random
procedure. However, it remains true even for the attacks following pre-ordered lists
of nodes: several nodes may have the same value of a given
characteristic (e.g. $k$, $z_2$, or one of the centrality indices), and the choice between
these nodes is then random. To check the dispersion of the results, Figs. 5, 6 show the
results of 10 complete attack sequences for the same scenario. Fig. 5 shows the
change in the largest connected component size $S$ of the PTNs of Dallas (a), Hong Kong
(b), and Paris (c) for the random vertex (RV) scenario. The scatter of the curves
in each figure provides an idea of the deviations between individual samples.
The figures also clearly show that, even when attacked randomly, PTNs of different cities
may display a range of different behaviour: from the comparatively fast decay of the
largest connected component (as in the case of Dallas, Fig. 5a) to very slow, nearly
linear decay (as in the case of Paris, Fig. 5c).

Fig. 5. Impact and variance of the random vertex (RV) scenario on the normalized
size $S$ of the largest component for the PTNs of (a) Dallas, (b) Hong Kong, and (c)
Paris. Ten curves of different colour indicate different instances of the same scenario
for each city. Vertical and horizontal axes as in Fig. 1.
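The scatter between instances of a random scenario can be quantified by repeating the attack with different random seeds and recording, at each step, the band between the smallest and the largest value of $S$. The sketch below does this for the RV scenario on a toy ring; removing one node per step, the helper names and the choice of ten seeds are assumptions for illustration.

```python
import random
from collections import deque


def largest_component_size(adj):
    """Number of nodes in the largest connected component (0 for an empty graph)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def rv_attack_curve(adj, rng):
    """S (in percent of the initial size) after each single random removal."""
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    n0 = len(adj)
    order = sorted(adj)
    rng.shuffle(order)
    curve = [100.0 * largest_component_size(adj) / n0]
    for v in order:
        for u in adj.pop(v):
            adj[u].discard(v)
        curve.append(100.0 * largest_component_size(adj) / n0)
    return curve


def spread(curves):
    """Width of the band between the lowest and highest curve at every step."""
    return [max(column) - min(column) for column in zip(*curves)]


# Hypothetical ring of 10 nodes, attacked ten times with different seeds.
ring = {v: {(v - 1) % 10, (v + 1) % 10} for v in range(10)}
curves = [rv_attack_curve(ring, random.Random(seed)) for seed in range(10)]
band = spread(curves)
```

For the ring the first removal always leaves a path of nine nodes, so the band has zero width at the first two points and opens up only afterwards, when the position of the second removed node starts to matter.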

The dispersion in the largest connected component size $S$ is much smaller for sequences
of targeted attacks. In Fig. 6 we show the behavior of the largest cluster
size and the maximal and mean shortest path lengths for the Paris PTN for ten
complete attack sequences following the recalculated degree ($k$) scenario. Besides
a rather narrow scattering of the data for $S$ one notes that, within the current
resolution, the locations of the maxima in $\ell_{\max}$ and $\overline{\ell}$ are very robust.

To give an idea of the numerical values of different characteristics of the PTN
as monitored during our analysis we display in Table 1 some data for the PTN of 
Paris for the recalculated degree scenario for some points of the sequence between 
the unperturbed network and the vicinity of the threshold (maximum of the shortest 
path lengths). 




Fig. 6. Ten instances of the recalculated highest degree scenario for the PTN of
Paris, observing: (a) the largest connected component size $S$, (b) the maximal shortest
path length $\ell_{\max}$, (c) the mean shortest path length $\overline{\ell}$. Horizontal axis as in Fig. 1.

Table 1. PTN of Paris during an attack sequence following the recalculated degree
scenario. $c$: % of removed nodes; $N$: number of remaining nodes; $\overline{k} = \overline{z}_1$: mean
node degree; $\overline{z}_2/\overline{z}_1$: ratio of the mean second to the mean first nearest neighbour
number; $\ell_{\max}$: maximal shortest path length; $\overline{\ell}$: mean shortest path length; $\langle \ell^{-1} \rangle$:
mean inverse shortest path length (for all of the remaining network); $\overline{C}_C$, $\overline{C}_G$, $\overline{C}_S$,
$\overline{C}_B$: mean closeness, graph, stress, and betweenness centralities; $\overline{C}$: mean clustering
coefficient; $S$: normalized largest component size.

4 Conclusions

In this paper we reported some results concerning the behavior of PTNs under
attack. Similar to other real-world and model complex networks [5-9,15], the PTNs
manifest very different behaviour under different attack scenarios. With some
notable exceptions they appear to be robust to random attacks but more vulnerable
to attacks targeted at nodes of particular importance as measured by the values
of certain characteristics (the most significant being the first and second neighbour
numbers, as well as the betweenness and stress centralities). The observed difference
between attack scenarios based on the initial and the recalculated distributions
shows that the network structure changes essentially during the attack sequence.
This has to be taken into account in the construction of efficient strategies
for the protection of these networks.

As a criterion to identify the level of resilience, i.e. the number of removed
nodes that leads to segmentation, it is useful to observe the behaviour of the
maximal shortest path length $\ell_{\max}$. For the majority of the PTNs we have analyzed
here, this observable displays a sharp maximum as a function of the removed node
concentration, which indicates the breakup of the network. Other 'order-parameter-like'
variables such as the largest connected component size $S$ or the average value of
the inverse shortest path length $\langle \ell^{-1} \rangle$ are less suitable for this purpose because of their
smooth behaviour. Another observation is that in the recalculated highest-degree
attack scenario the segmentation often occurs at a value of $\kappa = \overline{z}_2/\overline{z}_1 \approx 1$ (see
e.g. Table 1 for Paris). Although the PTNs are correlated structures, the above
estimate resembles the Molloy-Reed [23] criterion for randomly built uncorrelated
networks. Further investigation is needed to understand the mechanisms that lead
to higher resilience against random failure, as observed e.g. for the Paris network,
and how this behavior is related to the network architecture.

As mentioned above, there are different graph representations, also
called 'spaces', for a given PTN [3,4,19,20]. These will also lead to different connectivity
relations and path lengths between nodes. The resilience of PTNs in these
more general 'spaces' will be discussed elsewhere [18].